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Abstract 

The montecarlo method, which is quite commonly used to solve maximum entropy problems in 
statistical physics, can actually be used to solve inverse problems in a much wider context. The 
probability distribution which maximizes entropy can be calculated analytically by introducing 
Lagrange parameters. The problem of fixing these lagrangean parameters is circumvented by 
introduction of a micro canonical ensemble which describes a system together with its heath bath. 
Some further simplifying assumptions make it feasible to do montecarlo sampling of the probability 
distribution. The method is applied to the example of determining the distribution of the density 
of the earth from three data. Advantages of the method are guaranteed convergence and a clear 
information-theoretic foundation. 
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Maximum entropy has become one of the dominant methods for solving inverse problems 
of the underdetermined type. The equilibrium probability distribution function (pdf ) p of a 
canonical ensemble of classical statistical physics is known |I|] to be the solution of an inverse 
problem: It maximizes entropy S{p) given some constraints, e.g. given that the expectation 
value (H) of the energy functional H{x) 



has some predetermined value U. Now, montecarlo simulation is a well established numer- 
ical method, used in models of statistical physics |Q to sample the equilibrium pdf. The 
strength of the method follows from Markov chain properties which imply that, under the 
mild condition of ergodicity of the model, the simulation results converge to the exact result. 



tions generated during the simulation are typical solutions of the inverse problem of finding 
configurations which satisfy the given constraints. The present paper shows that this com- 
bination of maximum entropy and montecarlo simulation, which works so well in statistical 
physics, can be applied to a more general class of inverse problems. 

The maximum entropy method is often used to determine a density function p defined 
over some index set /. A toy example is the prediction of the density of the earth as a 
function of the distance r to the center of the earth from three data: mass M, radius R and 
moment of inertia J. Other examples are the restoration of images and computer-assisted 
tomography (CT). See In such cases it is common practice to interpret p, after suitable 
normalization, as a pdf which can be determined by the maximum entropy method. This 
is not the approach which is followed here. The alternative requires to consider pdfs on the 
abstract space X of all possible density distributions p. At first sight the latter approach 
may seem impractical for numerical evaluation because of the huge number of degrees of 
freedom that can be involved. However, by means of montecarlo simulation it is feasible to 
sample the pdf so that its average and covariance can be determined numerically without 
ever having to evaluate actual probabilities. At each moment only one density distribution 
is stored in the memory of the computer. 

The formal solution of the maximum entropy problem can be obtained analytically by 
introducing Lagrange parameters. A remaining problem is that of determining these lan- 
grangean parameters They have to be tuned so that constraints of the type {H) = U are 




(1) 



In addition, because of the equipartition theorem^, |^, one can expect that the configura- 
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satisfied. It is shown below that this problem can be avoided by solving the maximum en- 
tropy problem in the microcanonical ensemble instead of in the canonical or grand canonical 
ensemble. 

The main advantage of the new method is that it is based on clean theoretical con- 
cepts. The disadvantage is that computation times can be large. Therefore the method is 
complementary to existing techniques, mostly of iterative nature, which are fast but not 
optimal. A solution obtained with any of these techniques can be taken as starting point 
for improvement by montecarlo simulation, e.g. to eliminate artifacts from reconstructed 
images. 

In order to fix notations, consider the general problem of a classical experiment consisting 
of a number of measurements. The experiment is modeled as a function / from some space X 
of physical variables into some space V of all possible outcomes of the experiment. Given the 
experimental outcome v, one can define a pdf p on X indicating with which probability points 
of X can give rise to the experimental outcome v. This pdf is usually non- unique because the 
experiment yields only a finite amount of information. At this point the maximum entropy 
principle comes into play: From all pdfs that are compatible with the experimental outcome 
one should select the one which maximizes entropy. 

The entropy of the pdf p is given by 



It has to be maximized under the constraints that the averages (fj) equal the experimental 
data Vj, for j = 1,2, ■ ■ ■ , N. It is straightforward to show that this leads to the result 

with lagrangean parameters 7j, one for each component fj of /. These 7j have to be chosen 
such that the constraints {fj) — Vj hold. The latter can be a hard problem. It is avoided 
below by reformulating the problem in a different ensemble. 

The description of a mechanical system in canonical or grand canonical ensemble is known 
to be equivalent to the description of a system interacting with its environment. Let us 
therefore postulate that a pdf p{x, u) exists which combines the state x of the system and 
the noise uu of the environment. Both p and the pdf /j, of the environment can be derived 




(2) 
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from p by 

p{x) = I dujp{x,uj) (4) 



n 



and 

fi{uj) = I dxp{x^uj) (5) 



X 

The outcome f{x, u) of the experiment depends now on both the state of the system and the 
noise of the environment. The maximum entropy principle of the microcanonical ensemble, 
containing both system and environment, states that p should be varied in such a way that 
the entropy is maximal under the constraints that (|^) is satisfied and that f{x,uj) = v. 

Let us now assume that f{x,uj) = v has a unique solution as an equation in u. Denote 
it uj{x,v). Then the constraint f{x,uj) = v implies that 

p{x, U) = p{x)6uj{x,v) (^) (6) 

{6a is the distribution concentrating in the point a). A straightforward calculation gives 

p(x) = ie-^^^^^'")) (7) 

Zj 

with Z a normalization factor, and with lagrangean parameters one for each possible 

value of the noise lo. The parameters /5(co') have to be chosen in such a way that holds. 
This can be done easily. Note that (H) is fulfilled by construction. Using (||, ^ equation 
becomes 

/.H = e-^(-)la.H (8) 

with 

cr„(cc;) = \^ da;(5^(a;,„)(cc;) (9) 

There follows 



X 



. (10) 

In this result the pdf p is unknown, but is assumed to be fixed by the experimental envi- 
ronment. 

Assume now that i) the noise space coincides with the space V of outcomes, ii) the 
function / is of the form 

f{x,ix!) = g{x)^ijj (11) 



iii) /i is of the form 

fiiu) = {2nr'/' n cr-^e-(i/2HV-.^ (12) 
with parameters o"i ■ ■ ■ 0"^ controlUng the amount of noise. Let 

0,H= / dyl[5ig,iy)-w,) (13) 

with 6 Dirac's delta-function. Then one has ay{uj{x, v)) = (f)g{g{x)). Using uj{x, v) = v — g{x) 
one obtains from (0) our main resuh 

with 

Hi^) = lj:iv^-9A^)r/^' (15) 

The quantity (x) is the analog of the energy of statistical mechanics, the quantity log 4>g{w) 
is the entropy, in the sense of Boltzmann, of the macrostate consisting of all physical states 
y for which g{y) = w. 

The measurement functions gi, - ■ ■ ,gk can be completed with functions gk+i, ■ ■ ■ ,gN in 
such a way that together the functions gj,j = 1 ■ ■ ■ N form a new coordinate frame for the 
space X. It is now straightforward to calculate moments of the measurement functions 

{gj^) = dxp{x)g^{x) 

dg, (7r(27ra,)-i/2g-(i/2)(^^-*)'/'^l (16) 

This means that the pdf (|14D is such that each of the variables gj, j = 1 ■ ■ ■ k, has a normal 
distribution with average Vj and spread aj. 

Let us discuss in what follows how (|14D can be sampled using montecarlo simulation. 



Due to the quadratic nature of the hamiltonian H the only contributions to ([I^ come from 
points X such that g{x) is close to v (let us assume to this point that the uncertainties aj 
are small so that H becomes large if g{x) is not close to v). Hence, to zeroth order the term 
— \og(f)g{g{x)) in (0) is constant and can be neglected. For each update x — >• x', considered 
during execution of the montecarlo algorithm, one has to evaluate the change in energy 

AH = H{x') - H{x) 

= \ll iaA^') - 9A^)) [aA^') + aA^) - 2^.] (17) 
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The update is accepted if a random number r, uniformly distributed in [0, 1], is smaller than 
exp{—AH). The algorithm makes sense when the calculation of g{x') — g{x) can be done 
efficiently because then also AH can be obtained efficiently. An estimate of the relative 
vector g{x) — v can be maintained during the simulation by adding g{x') — g{x) to g{x) — v 
whenever the update is accepted. The estimate can be kept accurate by explicitly evaluating 
g{x) — V at regular time intervals. 

The main effect of the zeroth order approximation, assuming — log (f)g{g{x)) to be constant 
in (14), is a violation of ([161) . In particular the average of g{x), when calculated with the 
pdf obtained by the montecarlo simulation, will not coincide with v. Let us show how one 
can correct this deficiency. An expansion to first order gives 

log 09 (^(x)) ~ log(f)g{v) + ag{v).{g{x) - v) + ■ ■ ■ (18) 

with ag{v) = (f)g{v)~^'V(f)g{v). Introduce a new hamiltonian H' by 

H\x) = H{x) + \og(Pg{g{x)) 
1 ^ 

^ - ^(t;^- — (7j(x))^/(7| + constant terms (19) 

with v'j = Vj — {ag{v))j(Tj. It is easy to estimate v' — v numerically. Indeed, from the pdf 
obtained by montecarlo simulation one can calculate the average {g{x)) and the deviation 
w from the target v, i.e. 

w={g{x))-v (20) 

Then a good guess is v' = w — v. The montecarlo simulation can now be continued with v 
replaced by v'. In this way the error can be reduced to second order. 

Let us shortly discuss the example of calculating the density of the earth as a function of 
the distance r to the center of the earth from three data: mass M, radius R and moment 
of inertia J. The experimental data are SM/AnR^ = 5517 ± 5 kg/m^ and IbJ/AnR^ = 
0.84 X 5517 = 4634.28 ± 55 kg/m^ - the error bars are simple guesses made by the author. 
The sphere with radius R is divided into N shells of equal volume. was chosen equal to 
100 as in [^. The average density of the n-th shell is denoted pn- It satisfies 

^ 3M 



EPnK-(--l)^/^) = N'/'i^^ (21) 



n=l 




FIG. 1: Density function of shell number n 

The simulation is started with a uniform mass distribution. Two types of updates are used, 
an exchange of density of two randomly chosen shells, and an increment /decrement of the 
density of a randomly selected shell. The two update technique occur with equal probability. 
The increment /decrement is randomly selected from an interval with self-adapting bound- 
aries: the interval grows by a factor of 1.1 on success, and shrinks by a factor of 0.95 on 
failure. The simulation times have been chosen excessively large to stress the advantage of 
the present method that the convergence can only improve. First, 10,000 montecarlo steps 
(mcs) are used to be sure that the configuration of densities is typical (as usual one mcs 
contains update trials). The next 30,000 mcs are used to calculate the averages {g{x)) 
and the deviation w given by (pOl). The result is 

{giix)) = 5517.19, {g^ix)) = 4673.19 (22) 

Then the target v is replaced by v' and the simulation runs for 100,000 mcs to determine 
the average densities (p„) - see fig. 1. Due to the correction from v to v' the relations (p!6D 
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are better satified. Indeed, one obtains 

{gi{x)) = 5516.97, {g2{x)) = 4635.59 (23) 

The predicted density in the centre of the earth is about 12,460 kg/m^, at the surface about 
3,400 kg/m^. These values should be compared with the results of 11,200 or 13,600 
kg/m^ in the centre of the earth, depending on the method being used, and 3,250 kg/m^ at 
the surface. 

One concludes from the example that the new method works. It can compete with 
existing numerical techniques on two grounds: i) Absolute convergence. Most real world 
inverse problems are ergodic, in which case the montecarlo simulation can only improve 
with increasing computing time. In most iterative methods results start to deteriorate when 
the computations are not stopped in time, ii) Clear interpretation of the results. What 
one calculates is the average and variation of that distribution of density functions which 
maximizes entropy and hence contains the least information. From an information-theoretic 
point of view this is the best one can do. Of course, any method has its limitations. They 
will show up when the method is tried to a variety of problems. In particular, one can expect 
that in problems of image reconstruction montecarlo simulation will be useful to improve 
solutions obtained by interative methods. Much can be learned from the experience with 
montecarlo simulations acquired in statistical mechanics. In particular, it is obvious that 
efficient techniques for updating configurations are essential for improving computational 
speed. 
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